Google Trends™ and Quality of Information Analyses of Google™ Searches Pertaining to Concussion

Sports-related concussions occur with high incidence in the United States. Google Trends™ (GT) analyses indicate changes in public interest in a topic over time and can be correlated with the incidence of health events such as concussion. Internet searches represent a primary means of patient education for many health topics, including concussion; however, the quality of medical information yielded by internet searches is variable and frequently written at an inappropriate reading level. This study therefore aims to describe GT over time and evaluate the quality and readability of information produced by Google™ searches of the term “concussion.” We identified a strong negative correlation from 2009 to 2016 between GT scores and total number of American high school football participants (R² = 0.8553) and participants per school (R² = 0.9533). Between 2004 and 2020, the monthly GT popularity scores were variable (p = 3.193E-08), with September having the greatest scores, coinciding with the height of the American tackle football season. Applying five validated quality assessment scoring systems at two time points, we confirmed that different sources yielded varying quality of information. Academic and non-profit healthcare sources demonstrated the highest quality metrics across the two time points. There was significant variability of scores among the different scoring systems, however. The majority of searches at both time points yielded information that was rated as “fair” to “poor” in quality. Applying six readability tests, we found that only a single commercial website offered information written at or below the American Medical Association-recommended 6th-grade level for healthcare information. In summary, GT data analyses suggest that searches correlate with the American tackle football season and increased between 2009 and 2016, given that public interest in concussion increased and annual participation in football decreased. The quality and readability of information yielded by Google™ searches are inadequate, indicating the need for significant improvement.


Introduction
Concussion has the highest incidence of traumatic brain injuries (TBIs) and represents a major public health concern. Sports-related concussions have received much attention in the medical and lay press over the past two decades. Contact-collision sports have the highest rates of concussion and are played seasonally.
The internet is used commonly to research health and medical topics; however, the veracity of information derived from such searches has been found to vary considerably. Based on ubiquitous access to the internet, it is logical to assume that the internet is used by patients and family members to self-educate after a TBI has occurred.
In an effort to understand how the internet is used in researching concussion, we set out to study trends in internet use.
When studying trends, both infodemiology and infoveillance provide valuable insight. This type of data is already used to monitor a variety of searches, including disease outbreaks, through popular websites like Twitter and Google™. Launched in 2006, Google Trends™ (GT) is a tool that analyzes trends in search queries based on public interest and popularity. 1 Trillions of searches take place on Google annually, translating to several billion searches per day. At least 90% of searches around the world take place on Google. GT provides real-time and archived information regarding Google searches from 2004 onward, and it allows users to filter by geographical region, year, time span, and many other categories. It shows changes in interest over a given time period in any country or region for a selected term. GT data can be adjusted by year as well as by comparison between the same month across several years by using a variety of filters, such as Trending Searches, Year in Search, and Explore. The database takes into account spelling mistakes, accents, and plural forms, ensuring a valid analysis.
Google searches have a strong correlation with current events, with health and medical searches being the most focused. Public reaction toward healthcare information can also be detected from GT and has been especially useful in recent outbreaks and epidemics. 2,3 Thus, we hypothesize that GT can be used to study concussion frequency and rates, because it may show relationships between internet searches and changes in year-over-year and seasonal rates of sports participation.
We also aimed to study the veracity of information provided by internet searches using the search term “concussion.” Sources of information available on the internet are myriad and can profoundly affect the quality of information. Broad categories include personal, professional, organizational, commercial, governmental, and educational. It is logical to assume that websites affiliated with “authoritative” entities, such as educational and governmental organizations, will provide the most accurate and germane information; however, that assumption should be tested.
There are many tools used to objectively rate the quality of information on these websites. DISCERN is an instrument that helps users determine the quality of the information on treatment choices that a website provides through a series of questions. 4 The HONcode Principles are one of the earliest sets of quality markers for evaluating online healthcare information. 5 JAMA Benchmarks are a set of four criteria set by the American Medical Association (AMA), based on Authorship, Attribution, Currency, and Disclosure. 6 The Currency, Relevance, Authority, Accuracy, and Purpose (CRAAP) test was developed by the senior author of this article as a more comprehensive evaluation tool, with a set of criteria for each section. 7 Finally, readability is germane to knowledge acquisition. The AMA recommends that material should be written at a 6th-grade reading level to maximize understanding by the broadest segment of the U.S. population. We have chosen to use several automated scoring programs to ascertain the accessibility of information derived from internet searches.

Methods
Google Trends™
For individual search queries, GT displays the relative search volume (RSV), which is proportionate to the searches for that query over time, rather than the absolute search quantity. RSV is displayed as a score from 0 to 100. The score is a normalized proportion of a specific query's search volume divided by the total searches for all queries in a time period, geographical location, and topic filter. This means that a score of 0 indicates very few searches rather than none. Moreover, if the scores are the same between two different geographical locations or two different time points, the absolute search quantity may not be the same, but the normalized proportions are equivalent.
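The normalization described above can be illustrated with a minimal sketch. It assumes that each period's score is the query's share of all searches in that period, rescaled so that the largest share equals 100 and rounded to an integer. The function name and the counts are hypothetical, and GT's actual pipeline (for example, its sampling and rounding rules) is not reproduced here.

```python
def relative_search_volume(query_counts, total_counts):
    """Rescale per-period query shares so that the largest share maps to 100.

    Assumption: this mirrors the description in the text, not GT's internal code.
    """
    if len(query_counts) != len(total_counts):
        raise ValueError("both series must cover the same periods")
    # Share of all searches that went to the query in each period.
    shares = [q / t if t else 0.0 for q, t in zip(query_counts, total_counts)]
    peak = max(shares, default=0.0)
    if peak == 0:
        return [0 for _ in shares]
    return [round(100 * share / peak) for share in shares]


# Hypothetical counts, invented for illustration only.
query = [500, 2000, 8, 4000]
total = [1_000_000, 1_000_000, 1_000_000, 4_000_000]
rsv = relative_search_volume(query, total)
print(rsv)  # prints [25, 100, 0, 50]
```

In this toy series the fourth period has the most absolute searches but a score of only 50, because its share of all searches is half that of the peak period, and the third period scores 0 even though eight searches took place.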
We performed our GT search on April 30, 2021, using the Injury query of “concussion” with the categorical filter “Health” from 2004 to the present using the type of search “web search” and location “United States.” The Injury topic was used instead of the search term because our search was not concerned with searches unrelated to the concussion injury, such as searches intended for the 2015 film Concussion. Because American tackle football results in the greatest annual incidence of concussions, we aimed to ascertain whether GT data could be correlated with U.S. high school football participation. We derived appropriate statistics from the National Federation of State High School Associations' High School Participation Survey Archive. The GT analysis was replicated by reacquiring the data and reanalyzing immediately after the analysis and after figure production to ensure accuracy and reproducibility.

Quality of internet information
This study limited websites to the first page of Google searches because most seekers of health information do not go past the first page. 8 Two searches were performed in incognito mode: one on August 13, 2021, with the quality analysis performed between August 13 and August 15, 2021, and one on June 16, 2022, with the analysis performed between June 16 and June 19, 2022. Two searches were conducted to assess possible differences in scoring over time. Exclusion criteria included PDF documents, videos, URLs labeled as advertisements, and PowerPoint presentations.
Two independent reviewers assessed the quality of each website independently by website type. The following validated quality assessments were used: DISCERN; Health-on-The-Net (HON) Foundation code and certification status; JAMA Benchmarks; CRAAP test; and treatment content. See Appendix 1 for details of these tools. Website type was separated by broad categories: government; commercial; and academic, nonprofit healthcare sources.

Readability analysis
Each website assessed for quality of information was also analyzed for readability. According to the AMA, healthcare material should not be written above a 6th-grade reading level. 9 Six readability tests were applied: Flesch-Kincaid Reading Ease (FKRE); Flesch-Kincaid Grade Level (FKGL); Gunning Fog Index (GFI); Coleman-Liau Index (CLI); Simple Measure of Gobbledygook Index (SMOG); and Automated Readability Index (ARI). The analysis was performed with an automated analyzer available online by pasting each website's article text only into the tool (see Appendix 2).
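For orientation, three of the six tests can be written out from their standard published formulas. The sketch below takes pre-counted words, sentences, syllables, and characters as inputs; the function names and the counts in the example are hypothetical, and the online analyzer used in this study may count syllables or sentences differently, so its outputs need not match exactly.

```python
def flesch_reading_ease(words, sentences, syllables):
    """FKRE: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """FKGL: approximate U.S. school grade; lower is easier."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def automated_readability_index(characters, words, sentences):
    """ARI: approximate U.S. school grade; lower is easier."""
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43


# Hypothetical counts for a 100-word passage, invented for illustration.
counts = {"words": 100, "sentences": 5, "syllables": 150, "characters": 450}

fkre = flesch_reading_ease(counts["words"], counts["sentences"], counts["syllables"])
fkgl = flesch_kincaid_grade(counts["words"], counts["sentences"], counts["syllables"])
ari = automated_readability_index(counts["characters"], counts["words"], counts["sentences"])

# Both grade-level metrics are compared against the 6th-grade target.
TARGET_GRADE = 6
print(fkre, fkgl, ari)
print(fkgl <= TARGET_GRADE, ari <= TARGET_GRADE)
```

Longer sentences and more syllables or characters per word raise the grade-level scores and lower FKRE, which is why FKRE runs in the opposite direction to the other metrics.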
This study was considered exempt by institutional review board review because we examined publicly available, deidentified secondary data throughout the completed GT, Quality of Internet Information, and Readability analyses.

Statistical analysis
One-way analysis of variance (ANOVA) with α = 0.05 was performed on GT RSV scores to determine the presence of a statistically significant month-specific seasonal search pattern. Pearson's correlation coefficients were determined between average yearly GT scores for “concussion” and annual high school participation in football and all sports obtained from the National Federation of State High School Associations. Descriptive statistics were used for each website and website classification for the quality-of-information analysis. Cohen's kappa test was used to assess interrater reliability (two reviewers) for DISCERN, HONcode, JAMA Benchmarks, CRAAP test, and treatment score. Cohen's kappa scores were classified as excellent, good, fair, and poor for 1.00-0.75, 0.74-0.60, 0.59-0.40, and <0.4, respectively. 10 Raters then engaged in discussion to rectify each discrepancy in scoring and determine consensus scores. Concern for a possible non-normal distribution of consensus scores led to the decision to produce descriptive statistics only. Data were analyzed using Microsoft Excel (version 16.65; Microsoft Corporation, Redmond, WA) and SPSS software (version 27; IBM Corp, Armonk, NY).
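As an illustration of the interrater step, the following sketch computes an unweighted Cohen's kappa for two raters and applies the classification bands given above. The ratings are hypothetical, the study itself used Excel and SPSS rather than this code, and the treatment of values falling between the stated bands (for example, 0.745) is an assumption.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)


def classify_kappa(kappa):
    """Bands from the text; cut-offs between bands are an assumption."""
    if kappa >= 0.75:
        return "excellent"
    if kappa >= 0.60:
        return "good"
    if kappa >= 0.40:
        return "fair"
    return "poor"


# Hypothetical item-level ratings (1 = criterion met, 0 = not met).
reviewer_1 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
reviewer_2 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 2), classify_kappa(kappa))  # prints 0.8 excellent
```

Here the raters agree on 9 of 10 items (observed agreement 0.9) against a chance agreement of 0.5, giving a kappa of 0.8.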

Results
Google Trends™
The RSV began to rise after 2009 and eventually peaked in December 2015 (100 of 100). The RSV reached a minimum in April 2004 (9 of 100). Visually, the RSV displayed a yearly peak in search interest in the late summer and early fall months. Additionally, scores fell substantially in the months after the onset of the COVID-19 pandemic. Pearson's correlation coefficients demonstrated a strong negative correlation from 2009 to 2016 between RSVs for concussion and the total number of American high school football participants (R² = 0.8553), whereas there was a positive correlation with all high school sports participants (R² = 0.9053; Fig. 1). Subsequently, from 2017 to 2019, there was a strong negative correlation between RSVs for concussion and the total number of high school football participants (R² = 0.9914), whereas there was a weak correlation with all high school sports participants (R² = 0.5628; Fig. 2). Confirming football-related seasonality, one-way ANOVA showed that the mean month-specific popularity scores from 2004 to 2020 were not equal (p = 3.193E-08), and the months during the high school football season had statistically significant variability (September: p = 4.389E-05) with elevated average z-scores (Figs. 3 and 4).
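Because R² is never negative, the direction of each relationship reported above comes from the sign of Pearson's r rather than from R² itself. The sketch below shows the relationship between the two quantities using hypothetical yearly values. The variable names and numbers are invented for illustration and are not the study data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical yearly averages, invented for illustration only.
yearly_rsv = [20, 30, 40, 50]
football_participants_millions = [1.10, 1.08, 1.07, 1.04]

r = pearson_r(yearly_rsv, football_participants_millions)
r_squared = r ** 2

direction = "negative" if r < 0 else "positive"
print(direction, round(r_squared, 4))
```

A rising RSV paired with falling participation gives a negative r, while R² reports only how much of the variance the linear relationship explains.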

Quality of information
Nine results were displayed on the first page of Google at both time points of August 13, 2021 and June 15, 2022. Cohen's kappa inter-rater reliability was excellent for each scoring metric (Tables 1 and 2). The website categories government; academic, non-profit healthcare source; and commercial each had three websites in the first analysis and two, five, and two, respectively, in the second analysis. Among the website categories in the first analysis, academic, non-profit healthcare had the highest means for all scoring tools except for the treatment score, which was highest in the commercial category. In the second analysis, academic, non-profit healthcare had the highest means for the scoring tools DISCERN, HONcode, and Treatment. The commercial category had the highest mean for JAMA Benchmarks, and government had the highest mean for CRAAP. The government category had the lowest scores for each tool except for CRAAP in both analyses (Tables 3 and 4).
There were three websites that displayed the HONcode logo in each analysis. In the first analysis, mean scores for DISCERN and JAMA Benchmarks were higher among websites that displayed the logo whereas mean scores for CRAAP, HONcode, and Treatment scores were higher among sites that did have the logo (Table 1). In the second analysis, mean scores for DISCERN, JAMA Benchmarks, and CRAAP were higher among websites that did display the logo whereas mean scores for HONcode and Treatment scores were higher among sites that did not display the logo (Table 4). The Treatment score mean for all websites was 28% in the first analysis and 26% in the second. The most commonly listed treatments were Physical Rest and Cognitive Rest in both analyses. Aerobic Treatment, Vestibulo-Oculomotor Dysfunction Management/Treatment, and Psychosocial/Emotional Support were not listed by any website in either analysis whereas Cognitive Impairment Management/Treatment and Sleep Management/Treatment were listed only by one website each.

Readability
Readability was calculated for the same nine results in each analysis. For every metric except FKRE, lower outputs indicate higher readability; for FKRE, higher outputs indicate higher readability. When comparing mean scores of website categories, in both analyses the academic/non-profit/healthcare provider category's mean scores were highest for all metrics but FKRE, which was the lowest (Tables 3 and 4). The commercial category's mean scores were lowest for FKGL, GFI, SMOG, and ARI in the first analysis and were lowest for GFI and SMOG in the second analysis. The government category's mean scores were lowest for CLI and highest for FKRE in the first analysis. In the second analysis, the government category's mean scores were highest for FKRE and lowest for FKGL, CLI, and ARI. Two websites using FKGL and three websites using ARI were at or under the recommended 6th-grade level in the first analysis. One website using FKGL and two websites using ARI were at or under the recommended level in the second analysis. No other metric ever scored at or under the recommended level in either analysis.
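The counts reported above can be tallied to show how small a share of all readability results met the target. The sketch below simply restates the arithmetic (nine websites, six tests, two analyses); the variable names are illustrative.

```python
WEBSITES_PER_ANALYSIS = 9
READABILITY_TESTS = 6
ANALYSES = 2

# Results at or under the 6th-grade level, as reported in the text.
at_or_under_grade_6 = {
    "first analysis": {"FKGL": 2, "ARI": 3},
    "second analysis": {"FKGL": 1, "ARI": 2},
}

total_results = WEBSITES_PER_ANALYSIS * READABILITY_TESTS * ANALYSES
meeting_target = sum(
    sum(counts.values()) for counts in at_or_under_grade_6.values()
)
share = meeting_target / total_results

print(meeting_target, total_results, f"{share:.1%}")  # prints 8 108 7.4%
```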

Discussion
Google Trends™
Our results indicate that search interest for concussions increases during the American football season in the United States. Previous studies have linked diagnosis with specific medical conditions, such as osteoarthritis, to a high likelihood of using the internet to research the condition. 12 Additionally, these patients are more likely to engage actively in their care with their physician after online research by asking questions and exploring additional treatment options. 12 This highlights the importance of patient internet searches on patient self-education and the need for research into condition-specific search patterns.
The rise in RSV in the months during the youth football season suggests that an increase in concern about, or incidence of, concussion may be associated with youth football. Though this study was limited to correlative relationships and could not determine causality, when annual high school participation in sports was compared to the annual average RSV from 2009 to 2019, we found that as average RSV increased, there was a concomitant drop in high school football participation that did not occur in other sport categories regardless of the months in their respective seasons. 13 This suggests that youth football concussions or concern for concussions may have been a major driver of the RSV seasonality that we observed.
Interestingly, a spike in RSV was found from November 2015 to January 2016, coinciding with the release of the movie Concussion on November 10, 2015. This suggests that internet searches may be driven by causes other than concern over the individual injury, such as developments in popular culture.

Quality of information
We ascertained that there was a high degree of variability in the quality of information yielded by the Google searches. Further, there was significant variance among the scoring systems applied to the search results. For example, using the average scores of the first analysis, the average score was 46% of the total possible for DISCERN despite being 80% of the total possible for CRAAP. This asymmetry highlights the importance of using multiple tools to fully evaluate an online resource. We found that academic, non-profit healthcare sources' average scores were the highest in most quality metrics in both time-point analyses, indicating that websites in this category were the best sources of higher quality information.
There are confounding results, however. Despite having the highest DISCERN average in both analyses, the average DISCERN score for the academic, non-profit healthcare sources was classified only as “fair” in the first analysis and “poor” in the second. The overall average score for each analysis was also classified as “poor.” This indicates that the aggregate information available from Google searches is far from optimal quality, whereas the similarly poor scores between the two time points demonstrate a lack of improvement over time.
Finally, HONcode certification seemed to have little relevance to scores given that the average scores for websites displaying the logo were higher in only three of the five categories in the first analysis and two of five in the second. A previous study of online concussion information and readability also found that a majority of websites in 2012 did not have the logo. 2

Readability
Many articles using a variety of scoring systems have shown that online healthcare educational information most frequently does not meet the reading levels recommended for the general public. 2,3,5 This study reaches the same conclusion about health information regarding concussions. When assessing individual search results, only 8 of 108 results met the desired grade level across the two time-point analyses. Only two websites achieved two scores at the 6th-grade level (FKGL and ARI). A 2012 concussion information study revealed the same poor readability results, highlighting the lack of progress over the past decade. 2
In summary, our study revealed that GT analyses provide insight into the timing and frequency of searches pertaining to concussion. These correlated most strongly with seasonality of American tackle football and varied over the years based on the level of public interest in this topic. Overall, the first-page results revealed by Google searches of concussion yielded websites that offered a variable, but overall suboptimal, quality of information. Further, readability of the information associated with these websites was frequently at a higher educational grade level than recommended by the AMA. This indicates the need to improve the quality and readability of Google search-derived information pertaining to concussion.

Transparency, Rigor, and Reproducibility Summary
The study was not eligible for clinical trial registration or institutional review board review. 1 The planned analysis was not formally pre-registered, but the lead authors certify that the analysis plan was pre-specified. 2 There was a sample size of 210 monthly GT RSVs, 11 years of U.S. high school sports enrollment data, and 18 websites. GT RSVs were analyzed with correlation coefficients and ANOVA analysis of seasonality with p < 0.05. All 18 websites were screened with exclusion criteria, with analysis limited to descriptive statistics. 3 No inconsistencies in data availability or quality were discovered to justify data exclusion. 4 There were no participants who required blinding. Data analyses were performed by investigators blinded to the sport that GT data were compared with, but not the time the data reflected. During quality evaluation, two raters scored websites separately to limit bias, whereas rater agreement and summary statistics were performed by a third investigator blinded to score-specific rater identity. 5 Data were acquired between April 30, 2021 and June 19, 2022 between 8:00 AM and 9:00 PM, using computers with internet access and incognito Google search. Repeat data characteristics acquisition occurred in 10% of attempts because of user error. Data were analyzed using Microsoft Excel version 16.65 and SPSS version 27. Data were analyzed in three groups: GT RSVs, 2021 quality ratings, and 2022 quality ratings, with 0% batch analysis failure.
After appropriate data acquisition, no unexpected events occurred during the study. 6 All equipment and software used are widely available from Apple, Microsoft, and IBM. The GT RSVs, sports enrollment data, website links, and quality scoring system rubric links are available from the authors and openly accessible. 7 Key inclusion criteria are established-standards, internet-derived data research, with methodology validation including other studies analyzing Google-derived data reliability and insights such as Rovetta and colleagues (Frontiers) and Nuti and colleagues (PLoS One). Future validation will require further clinical correlation, and the primary clinical outcome's test-retest reliability is not formally determined. Key inclusion criteria and clinical outcomes were reviewed by an investigator with previous experience using Google-derived data and pediatric neurosurgery board certification. 9 Statistical tests were based on the assumption of GT self-reported normalized data sets. Potential non-normal distribution of quality ratings and low sample size limited their analysis to descriptive statistics. Sample sizes and degrees of freedom reflect the number of independent measurements. Data characteristics review addressed any non-independence of measurements.
No missing data were noted. Data sources were not appropriate for reporting primary outcome effect sizes and confidence intervals. Statistical analysis and/or review was performed by a pediatric neurosurgeon and medical students with statistical course training in Microsoft Office and SPSS. 10 Methods used do not require correction for multiple comparisons, with original and corrected measures of statistical error rates reported in the text. 11 This report includes documentation of internal replication for GT RSV analysis. Internal quality rating replication is ongoing. 12 Data are available online and through an incognito Google search of “concussion.” 13 Analytical codes used are available only from the authors. 1